Fast semi-supervised discriminant analysis for binary classification of large data-sets

Authors

  • Joris Tavernier
  • Jaak Simm
  • Karl Meerbergen
  • Jörg K. Wegner
  • Hugo Ceulemans
  • Yves Moreau
Abstract

High-dimensional data requires scalable algorithms. We propose and analyze three related, scalable algorithms for semi-supervised discriminant analysis (SDA). These methods are based on Krylov subspace methods, which exploit the data sparsity and the shift-invariance of Krylov subspaces. In addition, the problem definition is improved by adding centralization to the semi-supervised setting. The proposed methods are evaluated on an industry-scale data set from a pharmaceutical company to predict compound activity on target proteins. The results show that SDA achieves good predictive performance and that our methods require only a few seconds, significantly improving computation time over the previous state of the art.
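The shift-invariance mentioned in the abstract is the standard property that the Krylov subspace span{b, Ab, ..., A^(k-1)b} is unchanged when A is replaced by A + σI. The following is a minimal NumPy sketch of that property only, assuming a small random symmetric matrix; it is not the paper's algorithm or data, and the names `krylov_basis`, `sigma`, and `gap` are illustrative.

```python
import numpy as np

def krylov_basis(A, b, k):
    """Orthonormal basis of span{b, Ab, ..., A^(k-1) b} (Arnoldi, modified Gram-Schmidt)."""
    n = b.shape[0]
    Q = np.zeros((n, k))
    Q[:, 0] = b / np.linalg.norm(b)
    for j in range(1, k):
        w = A @ Q[:, j - 1]
        # Orthogonalize the new direction against all previous basis vectors.
        for i in range(j):
            w -= (Q[:, i] @ w) * Q[:, i]
        Q[:, j] = w / np.linalg.norm(w)
    return Q

# Hypothetical test problem: a random symmetric matrix and a random start vector.
rng = np.random.default_rng(0)
n, k, sigma = 50, 5, 2.5
M = rng.standard_normal((n, n))
A = M + M.T
b = rng.standard_normal(n)

Q = krylov_basis(A, b, k)
Q_shift = krylov_basis(A + sigma * np.eye(n), b, k)

# Two subspaces coincide exactly when their orthogonal projectors are equal.
gap = np.linalg.norm(Q @ Q.T - Q_shift @ Q_shift.T)
print(f"projector difference: {gap:.2e}")
```

Because the subspace does not depend on the shift, a basis built once can in principle be reused for several shifted systems (for example, different regularization values), which is the usual reason this property saves computation.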


Similar resources

Latent Fisher Discriminant Analysis

Linear Discriminant Analysis (LDA) is a well-known method for dimensionality reduction and classification. Previous studies have also extended the binary-class case to the multi-class case. However, many applications, such as object detection and keyframe extraction, cannot provide consistent instance-label pairs, while LDA requires instance-level labels for training. Thus it cannot be directly ap...


Fisher Discriminant Analysis (FDA), a supervised feature reduction method in seismic object detection

Automatic processing of seismic data using pattern recognition is one of the interesting fields in geophysical data interpretation. One part is seismic object detection using different supervised classification methods, which finally produces a probability cube as output. The object detection process starts with generating a pickset of two classes labeled as object and non-object and then selecting ...


Semi-Supervised Learning with Explicit Misclassification Modeling

This paper investigates a new approach for training discriminant classifiers when only a small set of labeled data is available together with a large set of unlabeled data. This algorithm optimizes the classification maximum likelihood of a set of labeled-unlabeled data, using a variant form of the Classification Expectation Maximization (CEM) algorithm. Its originality is that it makes use of b...


Using DEA for Classification in Credit Scoring

Credit scoring is a kind of binary classification problem that provides important information for managers making decisions, particularly in banking. The obtained scores give a loan officer a practical basis for classifying clients as accepted or rejected for a loan. To this end, in this paper a data envelopment analysis-discriminant analysis (DEA-DA) approach is us...


Variable Selection and Updating in Model-based Discriminant Analysis for High Dimensional Data with Food Authenticity Applications by Thomas

Food authenticity studies are concerned with determining if food samples have been correctly labeled or not. Discriminant analysis methods are an integral part of the methodology for food authentication. Motivated by food authenticity applications, a model-based discriminant analysis method that includes variable selection is presented. The discriminant analysis model is fitted in a semi-superv...



Journal title:
  • CoRR

Volume abs/1709.04794  Issue 

Pages  -

Publication date 2017